Query for less data on pagination count#3106

Open
rhysyngsun wants to merge 3 commits into main from nl/pagination-tweaks

Contributor

@rhysyngsun rhysyngsun commented Mar 25, 2026

What are the relevant tickets?

Description (What does it do?)

  • Refactors our pagination to have just 2 main implementations.
  • Sets a default pagination class and explicitly opts noncompliant views out of it to maintain their existing interfaces.
  • Updates the default pagination implementation to only query for the pk by default (this can be overridden if needed by subclassing).
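
As a rough illustration of the third bullet, the count override could look something like the sketch below. This is not the code from main/pagination.py: `PkCountMixin`, `count_fields`, and `FakeQuerySet` are hypothetical names, and the stand-in queryset exists only so the snippet runs without Django installed. In the real project the method would presumably sit on a DRF pagination class that exposes `get_count(queryset)`, as `LimitOffsetPagination` does.

```python
class PkCountMixin:
    """Hypothetical mixin: count against a narrow projection of the queryset."""

    # Fields kept when counting; a subclass can override this if it needs more.
    count_fields = ("pk",)

    def get_count(self, queryset):
        try:
            # Assumption: narrowing with .values() keeps the same rows but
            # selects fewer columns, so the COUNT subquery is cheaper.
            return queryset.values(*self.count_fields).count()
        except (AttributeError, TypeError):
            # Plain lists and other non-queryset sequences.
            return len(queryset)


class FakeQuerySet:
    """Stand-in for a Django QuerySet that records the selected fields."""

    def __init__(self, rows, fields=None):
        self.rows = rows
        self.fields = fields

    def values(self, *fields):
        return FakeQuerySet(self.rows, fields)

    def count(self):
        return len(self.rows)


paginator = PkCountMixin()
print(paginator.get_count(FakeQuerySet([{"pk": 1}, {"pk": 2}])))  # 2
print(paginator.get_count([1, 2, 3]))  # 3, via the len() fallback
```

A view that needs extra columns in its count (for example, because its DISTINCT depends on them) could subclass and set `count_fields` to a longer tuple.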

The effect of this is to turn queries like this one, which DRF previously ran to determine the count (this example is for the /api/v1/featured/ endpoint):

SELECT COUNT(*)
FROM (
  SELECT DISTINCT "learning_resources_learningresource"."id" AS "col1",
    "learning_resources_learningresource"."created_on" AS "col2",
    "learning_resources_learningresource"."updated_on" AS "col3",
    "learning_resources_learningresource"."readable_id" AS "col4",
    "learning_resources_learningresource"."title" AS "col5",
    "learning_resources_learningresource"."description" AS "col6",
    "learning_resources_learningresource"."full_description" AS "col7",
    "learning_resources_learningresource"."last_modified" AS "col8",
    "learning_resources_learningresource"."published" AS "col9",
    "learning_resources_learningresource"."languages" AS "col10",
    "learning_resources_learningresource"."url" AS "col11",
    "learning_resources_learningresource"."image_id" AS "col12",
    "learning_resources_learningresource"."platform_id" AS "col13",
    "learning_resources_learningresource"."certification" AS "col14",
    "learning_resources_learningresource"."certification_type" AS "col15",
    "learning_resources_learningresource"."resource_type" AS "col16",
    "learning_resources_learningresource"."resource_category" AS "col17",
    "learning_resources_learningresource"."ocw_topics" AS "col18",
    "learning_resources_learningresource"."offered_by_id" AS "col19",
    "learning_resources_learningresource"."etl_source" AS "col20",
    "learning_resources_learningresource"."professional" AS "col21",
    "learning_resources_learningresource"."next_start_date" AS "col22",
    "learning_resources_learningresource"."prices" AS "col23",
    "learning_resources_learningresource"."availability" AS "col24",
    "learning_resources_learningresource"."completeness" AS "col25",
    "learning_resources_learningresource"."delivery" AS "col26",
    "learning_resources_learningresource"."license_cc" AS "col27",
    "learning_resources_learningresource"."test_mode" AS "col28",
    "learning_resources_learningresource"."continuing_ed_credits" AS "col29",
    "learning_resources_learningresource"."pace" AS "col30",
    "learning_resources_learningresource"."format" AS "col31",
    "learning_resources_learningresource"."location" AS "col32",
    "learning_resources_learningresource"."duration" AS "col33",
    "learning_resources_learningresource"."min_weeks" AS "col34",
    "learning_resources_learningresource"."max_weeks" AS "col35",
    "learning_resources_learningresource"."time_commitment" AS "col36",
    "learning_resources_learningresource"."min_weekly_hours" AS "col37",
    "learning_resources_learningresource"."max_weekly_hours" AS "col38",
    "learning_resources_learningresource"."require_summaries" AS "col39", COUNT(
    "learning_resources_learningresourceviewevent"."id") AS "_views_count",
    "learning_resources_learningresourcerelationship"."position" AS "position"
  FROM "learning_resources_learningresource"
  LEFT OUTER JOIN "learning_resources_learningresourceviewevent" ON (
    "learning_resources_learningresource"."id" =
    "learning_resources_learningresourceviewevent"."learning_resource_id")
  INNER JOIN "learning_resources_learningresourcerelationship" ON (
    "learning_resources_learningresource"."id" =
    "learning_resources_learningresourcerelationship"."child_id")
  WHERE (
    "learning_resources_learningresourcerelationship"."parent_id" IN (
      SELECT U0."featured_list_id"
      FROM "channels_channel" U0
      WHERE U0."channel_type" = 'unit'
    ) AND "learning_resources_learningresource"."published"
  )

into a query like this:

SELECT COUNT(*) 
FROM 
  (
    SELECT 
      DISTINCT "learning_resources_learningresource"."id" AS "col1", 
      "learning_resources_learningresourcerelationship"."position" AS "position" 
    FROM "learning_resources_learningresource" INNER JOIN "learning_resources_learningresourcerelationship" ON (
        "learning_resources_learningresource"."id" = "learning_resources_learningresourcerelationship"."child_id"
      ) 
    WHERE 
      (
        "learning_resources_learningresourcerelationship"."parent_id" IN (
          SELECT 
            U0."featured_list_id" 
          FROM "channels_channel" U0 
          WHERE U0."channel_type" = 'unit'
        ) 
        AND "learning_resources_learningresource"."published"
      )
  ) subquery;

For these two specific queries, the new version is roughly 1000x faster, going from ~500ms for the version currently in main to ~0.3ms for this branch. I don't expect to see exactly the same improvement across the board, but there should be some improvement everywhere, because the count query no longer goes to disk as much for data and ideally just hits a few indices.
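
The two counts agree because every extra column in the wide version is determined by the resource's id, so dropping those columns cannot merge or split DISTINCT rows. A small sketch with Python's built-in sqlite3 illustrates this on a toy schema. The `resource` and `relationship` tables and their rows are made up for the example, and it says nothing about timing on production-scale data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT, published INTEGER);
    CREATE TABLE relationship (parent_id INTEGER, child_id INTEGER, position INTEGER);
    INSERT INTO resource VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 0);
    INSERT INTO relationship VALUES (10, 1, 0), (10, 2, 1), (11, 1, 0), (10, 3, 2);
    """
)

# Wide projection: every resource column plus the join column.
WIDE = """
SELECT COUNT(*) FROM (
  SELECT DISTINCT r.id, r.title, r.published, rel.position
  FROM resource r
  INNER JOIN relationship rel ON r.id = rel.child_id
  WHERE r.published
) AS subquery
"""

# Narrow projection: only the primary key plus the join column.
NARROW = """
SELECT COUNT(*) FROM (
  SELECT DISTINCT r.id, rel.position
  FROM resource r
  INNER JOIN relationship rel ON r.id = rel.child_id
  WHERE r.published
) AS subquery
"""

wide_count = conn.execute(WIDE).fetchone()[0]
narrow_count = conn.execute(NARROW).fetchone()[0]
print(wide_count, narrow_count)  # 2 2
```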

How can this be tested?

The app should still function. You likely won't see a huge difference in performance locally unless you have production-scale data around learning resources.

github-actions bot commented Mar 25, 2026

OpenAPI Changes

No detectable change.

View full changelog

Unexpected changes? Ensure your branch is up-to-date with main (consider rebasing).

@rhysyngsun rhysyngsun changed the title from "Nl/pagination tweaks" to "Query for less data on pagination count" Mar 25, 2026
@rhysyngsun rhysyngsun force-pushed the nl/pagination-tweaks branch 2 times, most recently from d604868 to 9731e2c March 27, 2026 20:37
@rhysyngsun rhysyngsun marked this pull request as ready for review March 27, 2026 20:49
@arslanashraf7
Contributor

@zamanafzal could you review this?

Contributor

Copilot AI left a comment

Pull request overview

This PR centralizes DRF pagination behavior and optimizes pagination count queries by counting against a reduced field set (defaulting to pk) to avoid expensive COUNT(DISTINCT <many columns>) subqueries on complex querysets.

Changes:

  • Introduces main.pagination.DefaultPagination (and LargePagination) and sets it as the global DRF DEFAULT_PAGINATION_CLASS.
  • Removes per-app pagination classes/overrides in favor of the global default, and explicitly opts out (pagination_class = None) for endpoints that must preserve non-paginated responses.
  • Adjusts the learning-resources summary action queryset to avoid incompatible select_related() behavior and reduce selected columns.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| video_shorts/views.py | Switches video shorts pagination to subclass the new global default pagination. |
| profiles/views.py | Explicitly disables pagination for the staff user admin endpoint to preserve its response shape. |
| news_events/views.py | Removes the app-local pagination implementation and relies on the global default pagination. |
| main/settings.py | Sets DRF DEFAULT_PAGINATION_CLASS to main.pagination.DefaultPagination. |
| main/pagination.py | Adds centralized DefaultPagination/LargePagination and overrides get_count() to count with fewer selected fields. |
| learning_resources_search/views.py | Disables pagination for the user subscription endpoint to preserve its response shape. |
| learning_resources/views.py | Removes local pagination classes, adopts centralized pagination, and updates the summary action queryset. |
| channels/views.py | Relies on global pagination for channel lists and disables pagination on non-list-like endpoints. |
| articles/views.py | Removes the app-local pagination implementation and relies on the global default pagination. |

"""
queryset = self.filter_queryset(
self.get_queryset().values("id", "last_modified")
# we don't use `self.get_queryset()` here because there are incomplatible

Copilot AI Mar 30, 2026

Typo in comment: "incomplatible" should be "incompatible".

Suggested change
- # we don't use `self.get_queryset()` here because there are incomplatible
+ # we don't use `self.get_queryset()` here because there are incompatible

)
from main.constants import VALID_HTTP_METHODS
from main.filters import MultipleOptionsFilterBackend
from main.pagination import LargePagination

Copilot AI Mar 30, 2026

Removing DefaultPagination/LargePagination from learning_resources.views breaks existing imports (e.g., testimonials/views.py imports LargePagination from learning_resources.views). Update those imports/usages to point to main.pagination (or re-export from learning_resources.views) to avoid ImportError at runtime.

Suggested change
- from main.pagination import LargePagination
+ from main.pagination import DefaultPagination, LargePagination

Contributor

I think we need to make this change at

from learning_resources.views import LargePagination

# we don't use `self.get_queryset()` here because there are incomplatible
# `select_related()` invocations and we don't need related data anyway
LearningResource.objects.filter(published=True).only("id", "last_modified")
)

Copilot AI Mar 30, 2026

The summary action now builds its own base queryset without .distinct(). Because LearningResource filters include many-to-many relations (e.g., topics/departments), filtering can introduce duplicate rows unless the queryset is made distinct. Consider applying .distinct() after filter_queryset (or otherwise ensuring uniqueness) to preserve correct counts/results.

Suggested change
- )
+ ).distinct()
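
To see why a filter that joins across a many-to-many relation can inflate the count, here is a toy reproduction using Python's built-in sqlite3. The `resource` and `topic_link` tables and the topic ids are invented for illustration; the real schema and filters differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE resource (id INTEGER PRIMARY KEY, published INTEGER);
    CREATE TABLE topic_link (resource_id INTEGER, topic_id INTEGER);
    INSERT INTO resource VALUES (1, 1), (2, 1);
    -- resource 1 matches both topics in the filter, resource 2 matches one
    INSERT INTO topic_link VALUES (1, 100), (1, 200), (2, 100);
    """
)

# Shared FROM/WHERE clause: filtering on the link table multiplies rows.
BASE = """
FROM resource r
INNER JOIN topic_link t ON t.resource_id = r.id
WHERE r.published AND t.topic_id IN (100, 200)
"""

with_dupes = conn.execute(
    "SELECT COUNT(*) FROM (SELECT r.id " + BASE + ") AS s"
).fetchone()[0]
deduped = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT r.id " + BASE + ") AS s"
).fetchone()[0]
print(with_dupes, deduped)  # 3 2
```

Only two resources exist, but the join yields three rows, so a count without DISTINCT reports one resource too many.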

Contributor

@zamanafzal zamanafzal left a comment

@rhysyngsun Thanks for the PR. The pagination count optimisation looks useful. I've left a few comments on the PR. Do you think they make sense?

)
from main.constants import VALID_HTTP_METHODS
from main.filters import MultipleOptionsFilterBackend
from main.pagination import LargePagination
Contributor

I think we need to make this change at

from learning_resources.views import LargePagination

self.get_queryset().values("id", "last_modified")
# we don't use `self.get_queryset()` here because there are incomplatible
# `select_related()` invocations and we don't need related data anyway
LearningResource.objects.filter(published=True).only("id", "last_modified")
Contributor

The changes to LearningResourceViewSet.summary() appear to bypass self.get_queryset() and the original deduplication logic. Since summary is used for sitemaps, could we add a test (or at least do a manual check) to ensure the count and the returned results stay consistent with the main list endpoint for the same filters (i.e., no duplicate-driven count inflation or paging gaps)?

"DEFAULT_AUTHENTICATION_CLASSES": (
"rest_framework.authentication.SessionAuthentication",
),
"DEFAULT_PAGINATION_CLASS": "main.pagination.DefaultPagination",
Contributor

Since we're now setting DEFAULT_PAGINATION_CLASS in settings, some endpoints might switch from returning a plain list ([]) to a full paginated response ({count, next, previous, results}).
Even with the pagination_class = None you added in a few places, do you think we should do a quick audit of all list endpoints to make sure we haven't accidentally changed the response shape for any frontend consumers?
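
For such an audit, a tiny helper that classifies a decoded JSON response as paginated or plain could be dropped into a test or a one-off script. This is only a sketch: `response_shape` is a hypothetical name, and the key set is taken from the `{count, next, previous, results}` envelope mentioned above.

```python
PAGINATED_KEYS = {"count", "next", "previous", "results"}


def response_shape(payload):
    """Classify a decoded JSON body as 'paginated', 'list', or 'other'."""
    if isinstance(payload, list):
        return "list"
    # Iterating a dict yields its keys, so issubset checks the envelope keys.
    if isinstance(payload, dict) and PAGINATED_KEYS.issubset(payload):
        return "paginated"
    return "other"


print(response_shape([]))  # list
print(response_shape({"count": 0, "next": None, "previous": None, "results": []}))  # paginated
print(response_shape({"detail": "Not found."}))  # other
```

Recording the shape of each list endpoint on main and again on this branch, then comparing the two, would surface any endpoint whose response changed.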

Contributor

(doesn't hurt to check, but this is the sort of thing our OpenAPI CI check should actually be very good at catching)

Member

@mbertrand mbertrand left a comment

Looks good, agree with other comments about:

  • adding distinct() to the LearningResourceViewSet summary view and tests to make sure counts are as expected with no dupes
  • updating the LargePagination import for testimonials

@rhysyngsun rhysyngsun force-pushed the nl/pagination-tweaks branch from 62fb2f7 to 03f664d March 30, 2026 16:37
Contributor

@zamanafzal zamanafzal left a comment

@rhysyngsun The changes look good to me. The only thing missing is tests to verify the LearningResourceViewSet summary view to make sure counts are as expected. If you think tests aren’t necessary, then we’re good to proceed with these changes.


6 participants